accepted manuscript
Cross-Modal Temporal Fusion for Financial Market Forecasting
Pei, Yunhua, Cartlidge, John, Mandal, Anandadeep, Gold, Daniel, Marcilio, Enrique, Mazzon, Riccardo
Accurate forecasting in financial markets requires integrating diverse data sources, from historical prices to macroeconomic indicators and financial news. However, existing models often fail to align these modalities effectively, limiting their practical use. In this paper, we introduce a transformer-based deep learning framework, Cross-Modal Temporal Fusion (CMTF), that fuses structured and unstructured financial data for improved market prediction. The model incorporates a tensor interpretation module for feature selection and an auto-training pipeline for efficient hyperparameter tuning. Experimental results using FTSE 100 stock data demonstrate that CMTF achieves superior performance in price direction classification compared to classical and deep learning baselines. These findings suggest that our framework is an effective and scalable solution for real-world cross-modal financial forecasting tasks.
Asymmetric Lesion Detection with Geometric Patterns and CNN-SVM Classification
Rasel, M. A., Kareem, Sameem Abdul, Kwan, Zhenli, Faheem, Nik Aimee Azizah, Han, Winn Hui, Choong, Rebecca Kai Jan, Yong, Shin Shen, Obaidellah, Unaizah
Accepted Manuscript: This is the peer-reviewed version of the article accepted for publication in Computers in Biology and Medicine. This manuscript version is made available under the CC BY-NC-ND license.
Abstract: In dermoscopic images, which allow visualization of surface skin structures not visible to the naked eye, lesion shape offers vital insights into skin diseases. In clinically practiced methods, asymmetric lesion shape is one of the criteria for diagnosing Melanoma. Initially, we labeled data for a non-annotated dataset with symmetry information based on clinical assessments. Subsequently, we propose a supporting technique (a supervised learning image processing algorithm) to analyze the geometrical pattern of lesion shape, aiding non-experts in understanding the criteria of an asymmetric lesion. We then utilize a pre-trained convolutional neural network (CNN) to extract shape, color, and texture features from dermoscopic images for training a multiclass support vector machine (SVM) classifier, outperforming state-of-the-art methods from the literature. In the geometry-based experiment, we achieved a 99.00% detection rate for dermatological asymmetric lesions. In the CNN-based experiment, the best performance was a 94% Kappa score, a 95% macro F1-score, and a 97% weighted F1-score for classifying lesion shapes (Asymmetric, Half-Symmetric, and Symmetric).
Introduction: Dermatological asymmetry, a cornerstone in skin lesion assessment, refers to disparities observed in the shape, size, or color of moles or lesions [1, 2, 3]. In dermatology, careful examination of the lesion shape is critical, especially when it comes to the possibility that lesions are cancerous, such as Melanoma. The dermatological three-point checklist for early skin cancer detection has showcased remarkable sensitivity in identifying Melanoma [2]. The presence of "asymmetry of color and structure in one or two perpendicular axes" stands as the initial criterion of this checklist [2]. In this method, asymmetry evaluation entails scrutinizing lesions within a plane bisected by two axes set at 90 degrees, assigning a score ranging from 0 to 2 based on the number of axes exhibiting asymmetry in shape, color, or structure.
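The 0-to-2 scoring rule can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it considers shape only (not color or structure), assumes the lesion is given as a binary mask already cropped to its bounding box, fixes the two perpendicular axes to the image's vertical and horizontal midlines, and uses a hypothetical mismatch threshold of 0.1. The names `mirror_mismatch` and `asymmetry_score` are illustrative.

```python
def mirror_mismatch(mask, axis):
    """Fraction of lesion pixels that do not overlap the lesion's mirror image."""
    if axis == "vertical":
        mirrored = [row[::-1] for row in mask]   # reflect left-right
    else:
        mirrored = mask[::-1]                    # reflect top-bottom
    area = sum(sum(row) for row in mask)
    if area == 0:
        return 0.0
    # Pixels that differ between the mask and its reflection (XOR area).
    diff = sum(a != b
               for row, mrow in zip(mask, mirrored)
               for a, b in zip(row, mrow))
    # XOR area = 2 * (area - overlap), so diff / (2 * area) = 1 - overlap / area.
    return diff / (2 * area)


def asymmetry_score(mask, threshold=0.1):
    """Count the axes (0, 1 or 2) whose mirror mismatch exceeds the threshold.

    The threshold is an assumed value for illustration, not a clinical one.
    """
    return sum(mirror_mismatch(mask, axis) > threshold
               for axis in ("vertical", "horizontal"))


t_shape = [[1, 1, 1],
           [0, 1, 0],
           [0, 1, 0]]
l_shape = [[1, 0, 0],
           [1, 0, 0],
           [1, 1, 1]]
print(asymmetry_score(t_shape))  # 1: symmetric left-right, asymmetric top-bottom
print(asymmetry_score(l_shape))  # 2: asymmetric about both axes
```

In practice the axes would be fitted to the lesion rather than fixed to the image grid, so this sketch only conveys how the count of asymmetric axes becomes the score.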
Bluish Veil Detection and Lesion Classification using Custom Deep Learnable Layers with Explainable Artificial Intelligence (XAI)
Rasel, M. A., Kareem, Sameem Abdul, Kwan, Zhenli, Yong, Shin Shen, Obaidellah, Unaizah
Melanoma, one of the deadliest types of skin cancer, accounts for thousands of fatalities globally. The bluish, blue-whitish, or blue-white veil (BWV) is a critical feature for diagnosing melanoma, yet research into detecting BWV in dermatological images is limited. This study utilizes a non-annotated skin lesion dataset, which is converted into an annotated dataset using a proposed imaging algorithm based on color threshold techniques on lesion patches and color palettes. A Deep Convolutional Neural Network (DCNN) is designed and trained separately on three individual and combined dermoscopic datasets, using custom layers instead of standard activation function layers. The model is developed to categorize skin lesions based on the presence of BWV. The proposed DCNN demonstrates superior performance compared to conventional BWV detection models across different datasets. The model achieves a testing accuracy of 85.71% on the augmented PH2 dataset, 95.00% on the augmented ISIC archive dataset, 95.05% on the combined augmented (PH2+ISIC archive) dataset, and 90.00% on the Derm7pt dataset. An explainable artificial intelligence (XAI) algorithm is subsequently applied to interpret the DCNN's decision-making process regarding BWV detection. The proposed approach, coupled with XAI, significantly improves the detection of BWV in skin lesions, outperforming existing models and providing a robust tool for early melanoma diagnosis.
Well-calibrated Confidence Measures for Multi-label Text Classification with a Large Number of Labels
Maltoudoglou, Lysimachos, Paisios, Andreas, Lenc, Ladislav, Martínek, Jiří, Král, Pavel, Papadopoulos, Harris
We extend our previous work on Inductive Conformal Prediction (ICP) for multi-label text classification and present a novel approach for addressing the computational inefficiency of the Label Powerset (LP) ICP, arising when dealing with a high number of unique labels. We present experimental results using the original and the proposed efficient LP-ICP on two English and one Czech language data-sets. Specifically, we apply the LP-ICP on three deep Artificial Neural Network (ANN) classifiers of two types: one based on contextualised (BERT) and two on non-contextualised (word2vec) word-embeddings. In the LP-ICP setting we assign nonconformity scores to label-sets from which the corresponding p-values and prediction-sets are determined. Our approach deals with the increased computational burden of LP by eliminating from consideration a significant number of label-sets that will surely have p-values below the specified significance level. This dramatically reduces the computational complexity of the approach while fully respecting the standard CP guarantees. Our experimental results show that the contextualised-based classifier surpasses the non-contextualised-based ones and obtains state-of-the-art performance for all data-sets examined. The good performance of the underlying classifiers is carried over to their ICP counterparts without any significant accuracy loss, but with the added benefits of ICP, i.e. the confidence information encapsulated in the prediction sets. We experimentally demonstrate that the resulting prediction sets can be tight enough to be practically useful even though the set of all possible label-sets contains more than $10^{16}$ combinations. Additionally, the empirical error rates of the obtained prediction-sets confirm that our outputs are well-calibrated.
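For readers unfamiliar with ICP, the step from nonconformity scores to p-values and prediction sets can be sketched in Python as follows. The p-value formula is the standard inductive conformal one. The nonconformity scores, the label names, and the function names `icp_p_value` and `prediction_set` are made up for illustration. The sketch also scores every candidate label-set explicitly, which is exactly what becomes infeasible at scale, and it does not reproduce the paper's elimination procedure.

```python
def icp_p_value(calibration_scores, candidate_score):
    """Standard ICP p-value: share of scores at least as nonconforming as the candidate.

    The candidate itself is counted, hence the +1 in numerator and denominator.
    """
    n = len(calibration_scores)
    at_least = sum(1 for a in calibration_scores if a >= candidate_score)
    return (at_least + 1) / (n + 1)


def prediction_set(calibration_scores, candidate_scores, significance):
    """Keep every candidate label-set whose p-value exceeds the significance level."""
    return {labels
            for labels, score in candidate_scores.items()
            if icp_p_value(calibration_scores, score) > significance}


# Hypothetical nonconformity scores of nine calibration examples.
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Hypothetical nonconformity scores of three candidate label-sets for one test text.
candidates = {
    frozenset({"sports"}): 0.05,
    frozenset({"sports", "politics"}): 0.85,
    frozenset({"weather"}): 0.95,
}

for labels in prediction_set(calibration, candidates, significance=0.15):
    print(sorted(labels))
```

Because the p-value can only decrease as the nonconformity score increases, any label-set whose score is known to exceed a cut-off can be discarded without computing its p-value. This is the kind of monotonicity an elimination strategy can exploit.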
Content and linguistic biases in the peer review process of artificial intelligence conferences
Vincent-Lamarre, Philippe, Larivière, Vincent
We analysed a recently released dataset of scientific manuscripts that were either rejected or accepted by various conferences in artificial intelligence. We used a combination of semantic, lexical and psycholinguistic analyses of the full text of the manuscripts to compare them based on the outcome of the peer review process. We found that accepted manuscripts were written with words that are less frequent, that are acquired at an older age, and that are more abstract than rejected manuscripts. We also found that accepted manuscripts scored lower on two indicators of readability than rejected manuscripts, and that they also used more artificial intelligence jargon. An analysis of the references included in the manuscripts revealed that accepted submissions were more likely to cite the same publications. This finding was echoed by pairwise comparisons of the word content of the manuscripts (i.e. an indicator of semantic similarity), which was higher in the accepted manuscripts. Finally, we predicted the peer review outcome of manuscripts with their word content, with words related to machine learning and neural networks positively related with acceptance, whereas words related to logic, symbolic processing and knowledge-based systems negatively related with acceptance.
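One way to picture a pairwise comparison of word content is a bag-of-words cosine similarity averaged over all pairs in a group. The abstract does not state which similarity measure was used, so the Python sketch below is an assumption made for illustration. The whitespace tokenisation, the toy texts, and the function names `cosine_similarity` and `mean_pairwise_similarity` are likewise hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def mean_pairwise_similarity(texts):
    """Average similarity over every unordered pair of texts in a group."""
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 0.0
    return sum(cosine_similarity(x, y) for x, y in pairs) / len(pairs)


# Toy groups standing in for accepted and rejected manuscripts.
group_a = ["neural network training loss", "deep neural network loss", "network training"]
group_b = ["symbolic logic rules", "knowledge base queries", "planning heuristics"]
print(mean_pairwise_similarity(group_a) > mean_pairwise_similarity(group_b))
```

Comparing the two group means mirrors the reported observation that word content was more similar among accepted manuscripts, although a real analysis would use full texts and a more careful tokeniser.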